We review a derivation of the numbers of RNA complexes of an arbitrary topology. These numbers 
are encoded in the free energy of the hermitian matrix model with potential V(x) = x^2/2 - stx/(1 - tx), 
where s and t are respective generating parameters for the number of RNA molecules and hydrogen 
bonds in a given complex. The free energies of this matrix model are computed using the so-called 
topological recursion, which is a powerful new formalism arising from random matrix theory. These 
numbers of RNA complexes also have profound meaning in mathematics: they provide the number of 
chord diagrams of fixed genus with specified numbers of backbones and chords as well as the number 
of cells in Riemann's moduli spaces for bordered surfaces of fixed topological type. 

In this paper, we review a solution to the following problem studied and solved in [1]. Let us consider a complex 
consisting of an arbitrary number of RNA chains with various nucleotides connected by Watson-Crick bonds both 
intra- and inter-chain. We seek the number of such topologically inequivalent complexes consisting of a given number 
of RNA chains and bonds. To explain the topology, it is useful to rephrase this problem in terms of the so-called 
chord diagrams. Chord diagrams are comprised of a number of so-called backbones, which are represented by disjoint, 
oriented and labeled intervals lying along a fixed line, that are connected by so-called chords taken as semi-circles 
lying above this fixed line. Each backbone is identified with the sugar-phosphate backbone (hence the terminology) of 
a single RNA molecule oriented from its 5' to 3' end. Chords correspond to Watson-Crick basepairs, where we add an 
associated chord taking care that its endpoints in each backbone occur in the correct order corresponding to the primary 
structure, i.e., the word in the four-letter alphabet of nucleic acids that determines the RNA molecule. Endpoints of 
chords lie at distinct interior points of the backbones. In this way, a complex of interacting RNA molecules determines 
a chord diagram. We assume that nucleotides not participating in basepairs play no role in this model, i.e., there are no 
isolated vertices, and that chord diagrams are connected, i.e., the corresponding RNA complexes are also connected. 
Now, distinct isomorphism classes of chord diagrams correspond to the topologically inequivalent configurations we 
wish to count. An example of an RNA configuration (consisting of a single RNA chain) and the corresponding chord 
diagram (on one backbone) are shown in Fig. 1 (this figure is borrowed from [2]). We also stress that the results 
presented in what follows can be applied to various complexes of (bio)polymers, not necessarily RNA complexes. 
However, for definiteness, we discuss these results just in the RNA context. 

FIG. 1: An RNA complex (above) can be uniquely represented as a chord diagram (below). Each RNA chain (just one in the 
example in the figure) corresponds to a single backbone (a horizontal interval in the chord diagram), and each hydrogen bond 
is represented by a chord (a semi-circle with ends attached to a backbone). 

It turns out that the counting of RNA complexes, or chord diagrams, can be refined by introducing an additional 
parameter, which naturally characterizes the topology of these structures. This parameter arises as the genus g ≥ 0 
of an auxiliary topological surface with boundary, which can be associated to a given chord diagram by thickening 
its backbones and chords, as shown in Fig. 2. This surface has some number r ≥ 1 of boundary components. If we 
denote the number of backbones and chords by b and n respectively, then the Euler characteristic of this auxiliary 
surface can be expressed as b - n = 2 - 2g - r; this relation can be used in particular to determine the genus g. For 
example, in Fig. 2 we have b = 2 backbones, n = 4 chords, and r = 4 boundary components, which implies that the 
corresponding surface has genus g = 0. Let us remark parenthetically that [2] provides a context-free grammar and 
polynomial algorithm for computing minimum-free energy configurations of a single RNA molecule while allowing 
for certain pseudo-knot patterns of arbitrarily high genus that arise by suitably iterating genus one; since the $c_{g,b}(n)$ 
grow rapidly in g or b for n in the appropriate range, it is unlikely that a similar higher-genus approach is reasonable 
short of a full-blown and as-yet-unknown field theory for RNA. 
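
The Euler relation above can be turned into a short computation. The following is a minimal sketch, not code from [1] or [2]: it assumes a connected chord diagram encoded as a list of backbones, each listing the labels of the chords met from the 5' to the 3' end, and it counts the boundary components r by walking along the edges of the thickened surface. The function names and the encoding are illustrative choices made here.

```python
def count_boundary_components(backbones):
    """Count the boundary components r of the thickened chord diagram.

    `backbones` is a list of backbones; each backbone lists the chord labels
    met when walking along it from the 5' to the 3' end. Every label must
    appear exactly twice in total (hypothetical encoding chosen for this sketch).
    """
    ends = {}
    for i, backbone in enumerate(backbones):
        for j, label in enumerate(backbone):
            ends.setdefault(label, []).append((i, j))
    if any(len(pair) != 2 for pair in ends.values()):
        raise ValueError("each chord label must appear exactly twice")

    partner = {}
    for first, second in ends.values():
        partner[first] = second
        partner[second] = first

    seen = set()
    r = sum(1 for backbone in backbones if not backbone)  # bare backbones
    for start in partner:
        if start in seen:
            continue
        r += 1
        end = start
        while end not in seen:
            seen.add(end)
            # Cross the chord, then move to the next endpoint on that backbone,
            # wrapping around the underside of the backbone after its last endpoint.
            i, j = partner[end]
            end = (i, (j + 1) % len(backbones[i]))
    return r


def genus(backbones):
    """Genus from b - n = 2 - 2g - r; assumes the diagram is connected."""
    b = len(backbones)
    n = sum(len(backbone) for backbone in backbones) // 2
    r = count_boundary_components(backbones)
    return (2 - r - (b - n)) // 2


if __name__ == "__main__":
    # A diagram with b = 2 and n = 4 (not necessarily the one drawn in Fig. 2).
    example = [[1, 1, 2, 3], [3, 2, 4, 4]]
    print(count_boundary_components(example), genus(example))
    # Two crossing chords on one backbone: the simplest pseudoknot.
    print(genus([[1, 2, 1, 2]]))
```

The example diagram reproduces the numbers quoted above (b = 2, n = 4, r = 4, g = 0), while two crossing chords on a single backbone give genus one.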

To sum up, a given RNA complex or chord diagram is characterized by its number of backbones b, its number of 
chords n, and its genus g. Let us denote the number of topologically inequivalent such diagrams by $c_{g,b}(n)$. These are 
the numbers we wish to determine. In what follows, we explain how to determine the following generating functions 
of these numbers 

$$ C_{g,b}(z) = \sum_{n \ge 0} c_{g,b}(n)\, z^n, \quad \text{for } g \ge 0, \tag{1} $$

using random matrix theory, and in particular, the so-called topological recursion. Before we present how these 
generating functions can be found, let us provide some examples. In the special case of one backbone and genus 
g = 0, it is not hard to see that $c_{0,1}(n)$ are Catalan numbers with the generating function 

$$ C_{0,1}(z) = \frac{1 - \sqrt{1 - 4z}}{2z} = 1 + z + 2z^2 + 5z^3 + 14z^4 + \ldots \tag{2} $$

so that $c_{0,1}(0) = c_{0,1}(1) = 1$, $c_{0,1}(2) = 2$, $c_{0,1}(3) = 5$, and so on. More generally, configurations which involve only one 
backbone can be enumerated by the simplest matrix model with the quadratic (Gaussian) potential [3]. However, this 
Gaussian model cannot describe configurations involving many backbones, and constructing a model which works for 
many backbones is an important motivation for our work; we describe this new model below. Using this new model, 
we find for example that generating functions of diagrams in genus zero and one on four backbones with an arbitrary 
number of chords take the form 



$$ C_{0,4}(z) = \frac{24 z^3 (3 + 18z + 8z^2)}{(1 - 4z)^7} = 72 z^3 + 2448 z^4 + \ldots, $$

$$ C_{1,4}(z) = 17160\, z^5 + \ldots \tag{3} $$
This means that $c_{0,4}(3) = 72$ and $c_{0,4}(4) = 2448$, for example. While computation of these numbers by explicit enumeration 
of chord diagrams is possible for relatively low and fixed g, b and n, as illustrated, e.g., in the appendix in [1], it 
quickly becomes involved. On the other hand, random matrix theory allows us to determine the generating functions 
$C_{g,b}(z)$ in an algorithmic way, in principle for all g and b. As yet another example, for 4 and 5 backbones, in genus 2 
and 3, we find the following generating functions 

FIG. 2: Each RNA complex uniquely determines a chord diagram consisting of b backbones and n chords (left). By thickening 
backbones and chords we obtain a surface (right) with r boundary components. The genus g of this surface provides an important 
characteristic of the corresponding chord diagram, and it can be determined from the Euler relation b - n = 2 - 2g - r. Our 
task is to determine the numbers $c_{g,b}(n)$ of topologically inequivalent chord diagrams on b backbones, with n chords, and 
characterized by genus g. 
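
The explicit enumeration mentioned in the text can be sketched by brute force for small b and n. This is an illustrative check rather than the method of [1]: it assumes that a diagram on b labeled backbones is specified by the number of chord endpoints on each backbone together with a perfect matching of all 2n endpoints, keeps only connected diagrams, and reads off the genus from b - n = 2 - 2g - r. All function names are hypothetical.

```python
from itertools import combinations


def perfect_matchings(points):
    """Yield all perfect matchings of a list of points as lists of pairs."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for k, other in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, other)] + matching


def compositions(total, parts):
    """Yield tuples of `parts` consecutive block sizes summing to `total`."""
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0,) + cuts + (total,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(parts))


def genus_or_none(sizes, matching):
    """Genus of the diagram, or None if the diagram is not connected."""
    b = len(sizes)
    owner = [(i, j) for i, size in enumerate(sizes) for j in range(size)]
    parent = list(range(b))  # union-find over backbones

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    partner = {}
    for p, q in matching:
        partner[owner[p]] = owner[q]
        partner[owner[q]] = owner[p]
        parent[find(owner[p][0])] = find(owner[q][0])
    if len({find(i) for i in range(b)}) != 1:
        return None

    # Count boundary components by following chord, then next endpoint.
    seen, r = set(), sizes.count(0)
    for start in partner:
        if start in seen:
            continue
        r += 1
        end = start
        while end not in seen:
            seen.add(end)
            i, j = partner[end]
            end = (i, (j + 1) % sizes[i])
    return (2 - r - (b - len(matching))) // 2


def count_diagrams(g, b, n):
    """Number of connected genus-g diagrams on b labeled backbones, n chords."""
    matchings = list(perfect_matchings(list(range(2 * n))))
    return sum(
        1
        for sizes in compositions(2 * n, b)
        for matching in matchings
        if genus_or_none(sizes, matching) == g
    )


if __name__ == "__main__":
    print([count_diagrams(0, 1, n) for n in range(5)])  # one backbone, genus 0
    print(count_diagrams(0, 4, 3), count_diagrams(0, 4, 4))
```

Under these assumptions the sketch reproduces the Catalan numbers for one backbone in genus zero and the values 72 and 2448 quoted for four backbones. The cost grows like (2n-1)!! times the number of compositions, which illustrates why direct enumeration quickly becomes involved.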

We can explain now how the above generating functions can be determined from the random matrix theory. As 
shown in [1], all chord diagrams can be enumerated by the following matrix model, i.e., an integral over hermitian 
matrices H of size N, 



$$ Z = \int DH \, e^{-N\, \mathrm{tr}\, V(H)} = \exp\Big[ -N^2 s + \sum_{g=0}^{\infty} N^{2-2g} F_g \Big], \tag{5} $$



where the so-called matrix model potential is given by 



$$ V(x) = \frac{x^2}{2} - \frac{stx}{1 - tx}. \tag{6} $$



These expressions should be understood as follows. The integral in (5) with the potential given in (6) arises from 
the combinatorial analysis of chord diagrams, and its form was found in [1] using well-known techniques in random 
matrix models, analogous, e.g., to those used in [4]. In the limit of large matrix size N → ∞, the free energy of this 
model F = log Z has the genus expansion in N given in the exponent on the right side of (5) (the extra summand 
$-N^2 s$ in the exponential in (5) is purely for convenience). In general, finding the free energy of a matrix model of the 
above form (with arbitrary potential V(x)) is a very difficult task. On the other hand, such free energies $F_g$, for the 
potential given in (6), are essentially the objects we are after. Namely, from the combinatorial description discussed 
in [1], it follows that the free energies $F_g$ encode the generating functions (1) 



$$ F_g(s,t) = \mathrm{const} + \sum_{b \ge 1} \frac{s^b}{b!}\, C_{g,b}(t^2), \quad \text{for } g \ge 0, \tag{7} $$
where the constant terms reproduce the free energies $B_{2g}/(2g(2g-2))$ of the Gaussian matrix model (characterized by the 
quadratic potential $V(x) = \frac{1}{2} x^2$), where $B_{2g}$ denote Bernoulli numbers. The extra factor b! arises because $c_{g,b}(n)$ 
counts chord diagrams with labeled backbones as opposed to unlabeled ones as arise in the matrix model description 
(and where a permutation of backbones must nevertheless preserve backbone orientations). We also see that the genus 
expansion into free energies $F_g$ agrees with the genus g of the auxiliary surfaces described earlier. 
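
As a brief gloss on why this potential keeps track of backbones and chords, the non-Gaussian term can be expanded as a geometric series. The reading below assumes the standard Feynman-diagram interpretation of matrix integrals, which is spelled out in [1] rather than here: each term $s\, t^k\, \mathrm{tr}\, H^k$ acts as a backbone carrying k chord endpoints, and each chord has two endpoints and so contributes $t^2$, which is why the generating functions are evaluated at $t^2$. The second line evaluates the Gaussian constants for the two lowest genera to which the Bernoulli formula applies.

```latex
% Expansion of the non-Gaussian part of the potential, valid for |tx| < 1.
% The constant s, once traced and multiplied by -N, gives the summand -N^2 s.
-\frac{stx}{1-tx} \;=\; s - \frac{s}{1-tx} \;=\; -\,s \sum_{k \ge 1} t^{k} x^{k}

% Gaussian constants B_{2g} / (2g(2g-2)), using B_4 = -1/30 and B_6 = 1/42.
\frac{B_4}{4 \cdot 2} = -\frac{1}{240}, \qquad
\frac{B_6}{6 \cdot 4} = \frac{1}{1008}
```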

The problem of enumerating RNA complexes therefore reduces to the problem of performing the matrix integral 
and determining the free energies $F_g$ in (5). As we already mentioned, in general this is a very difficult problem; however, 
recently a beautiful algorithmic solution to this problem has been given. To find the free energies, one should solve the 
so-called loop equations of the matrix model, which are equations (known as Ward identities in quantum field theory) 
satisfied by certain multi-linear correlators $W^{(g)}_n(p_1, \ldots, p_n)$ in this model. The leading order equation among those 
identities specifies a so-called spectral curve, i.e., an algebraic curve which characterizes the distribution of eigenvalues 
in the matrix model in the N → ∞ limit. It also turns out that all correlators $W^{(g)}_n(p_1, \ldots, p_n)$ and the loop equations 
they satisfy can be encoded entirely in terms of this spectral curve. These loop equations can be solved in a recursive 
way [5-7], and in this manner, the free energies $F_g$ (for g ≥ 2) are completely determined by the correlators $W^{(g)}_1(p)$. This 
entire procedure requires just the knowledge of the spectral curve, which can be regarded as the initial condition of 
the recursion (and a universal form of the solution to loop equations), and no other details of the matrix model from 
which this curve was derived. An important achievement of Eynard and Orantin [7] was to realize that one can use 
the recursive solution of loop equations to assign correlators $W^{(g)}_n(p_1, \ldots, p_n)$ and $F_g$ to an arbitrary algebraic curve, 
not necessarily of matrix model origin. On the other hand, it is guaranteed that the $F_g$ computed for the spectral curve 
of a matrix model reproduce its free energies. 
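
For orientation, the recursion described above is usually written in the following schematic form. This is a sketch in one common convention for the Eynard-Orantin formalism [7], not a transcription from [1]: overall signs and normalizations differ between references, and the symbols $\omega_{g,n}$ (differential forms built from the correlators), $K$, $\bar{q}$ and $\Phi$ are notation introduced here for illustration.

```latex
% Input data: a spectral curve with branch points \alpha; \bar{q} denotes the
% local conjugate of q near a branch point, i.e. x(\bar{q}) = x(q).
\omega_{0,1}(p) = y(p)\, dx(p), \qquad
\omega_{0,2}(p_1,p_2) = \frac{dp_1\, dp_2}{(p_1 - p_2)^2}
\quad \text{(genus-zero curve with global coordinate } p \text{)}

% Recursion kernel.
K(p,q) = \frac{\tfrac{1}{2} \int_{\bar{q}}^{q} \omega_{0,2}(p,\cdot)}
              {\omega_{0,1}(q) - \omega_{0,1}(\bar{q})}

% Recursion on 2g - 2 + n; the primed sum omits terms containing \omega_{0,1}.
\omega_{g,n+1}(p, p_1, \dots, p_n) = \sum_{\alpha} \operatorname*{Res}_{q \to \alpha} K(p,q)
  \Big[ \omega_{g-1,n+2}(q, \bar{q}, p_1, \dots, p_n)
  + \sum_{h=0}^{g} \; \sideset{}{'}\sum_{I \sqcup J = \{p_1,\dots,p_n\}}
    \omega_{h,1+|I|}(q, I)\, \omega_{g-h,1+|J|}(\bar{q}, J) \Big]

% Free energies for g >= 2, with d\Phi = \omega_{0,1}.
F_g = \frac{1}{2-2g} \sum_{\alpha} \operatorname*{Res}_{q \to \alpha} \Phi(q)\, \omega_{g,1}(q)
```

The last line makes explicit the statement that the free energies for g ≥ 2 follow from the one-point correlators alone.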

In order to solve the matrix model (5) with the potential (6), we can therefore use the formalism of this topological 
recursion. This has indeed been done in [1], and the main steps of this solution are as follows. First, we need to 
determine the spectral curve of the model (5). This can be done by the analysis of the distribution of eigenvalues in the 
large N limit. Because the potential (6) is a deformation of the quadratic function, it has a single minimum, and in 
the equilibrium configuration eigenvalues spread around this minimum. For large N the eigenvalues are distributed 
along an interval with end-points a and b, which defines a cut in a certain auxiliary complex plane. Such a one-cut 
solution defines the corresponding spectral curve which has genus zero, and it turns out to be given by the following 
algebraic equation for two complex variables x and y 

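$$ y^2 (tx - 1)^4 = (x - a)(x - b) \left( \left( tx - 1 + \frac{(a+b)t}{4} \right)^2 + \gamma \right)^2, \tag{8} $$

where 

$$ \gamma = -\frac{(at + bt) \left( (at)^2 + (bt)^2 + 14 (at + bt - abt^2) - 16 \right)}{16 (at + bt - 2)}. \tag{9} $$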

While the end-points of the cut a and b cannot be given in a closed form, they are determined 
by the following system of equations 

$$ 0 = a + b + \frac{st(at + bt - 2)}{\left( (at-1)(bt-1) \right)^{3/2}}, $$

$$ 16 = (a - b)^2 + \frac{4s \left( \left( 2 - \frac{(a+b)t}{2} \right)(at + bt - 2) + 2abt^2 - 3t(a+b) + 4 \right)}{\left( (at-1)(bt-1) \right)^{3/2}}. \tag{10} $$
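
Since the end-points are only defined implicitly, a numerical solution is a natural sanity check. The sketch below is an illustration under stated assumptions, not the procedure of [1]: the two end-point conditions and the expression for γ are typed in as they are read here from (8)-(10), a plain Newton iteration with a finite-difference Jacobian is started from the Gaussian values a = -2, b = 2, and the resulting eigenvalue density, taken to be $M(x)\sqrt{(x-a)(b-x)}/(2\pi)$ with $M(x)$ the rational factor of the curve, is checked to integrate to one. Function names and the sample values s = 0.1, t = 0.2 are hypothetical.

```python
import math


def endpoint_residuals(a, b, s, t):
    """Both end-point conditions written as f = 0 (forms assumed from (10))."""
    sig1 = a * t + b * t
    p32 = ((a * t - 1.0) * (b * t - 1.0)) ** 1.5
    f1 = a + b + s * t * (sig1 - 2.0) / p32
    bracket = ((2.0 - sig1 / 2.0) * (sig1 - 2.0)
               + 2.0 * a * b * t * t - 3.0 * sig1 + 4.0)
    f2 = (a - b) ** 2 + 4.0 * s * bracket / p32 - 16.0
    return f1, f2


def solve_endpoints(s, t, iterations=50, eps=1e-7):
    """Newton iteration from the Gaussian end-points a = -2, b = 2."""
    a, b = -2.0, 2.0
    for _ in range(iterations):
        f1, f2 = endpoint_residuals(a, b, s, t)
        if abs(f1) + abs(f2) < 1e-13:
            break
        # Finite-difference Jacobian of (f1, f2) with respect to (a, b).
        f1a, f2a = endpoint_residuals(a + eps, b, s, t)
        f1b, f2b = endpoint_residuals(a, b + eps, s, t)
        j11, j21 = (f1a - f1) / eps, (f2a - f2) / eps
        j12, j22 = (f1b - f1) / eps, (f2b - f2) / eps
        det = j11 * j22 - j12 * j21
        a += (-f1 * j22 + f2 * j12) / det
        b += (-f2 * j11 + f1 * j21) / det
    return a, b


def density_norm(a, b, t, steps=2000):
    """Integral of the assumed eigenvalue density over the cut [a, b]."""
    sig1 = (a + b) * t
    gamma = -sig1 * ((a * t) ** 2 + (b * t) ** 2
                     + 14.0 * (sig1 - a * b * t * t) - 16.0) / (16.0 * (sig1 - 2.0))
    mid, half = (a + b) / 2.0, (b - a) / 2.0
    total = 0.0
    for k in range(steps):
        # Substitution x = mid + half*cos(theta) removes the square-root edges.
        theta = math.pi * (k + 0.5) / steps
        x = mid + half * math.cos(theta)
        m = ((t * x - 1.0 + sig1 / 4.0) ** 2 + gamma) / (t * x - 1.0) ** 2
        total += m * (half * math.sin(theta)) ** 2
    return total * (math.pi / steps) / (2.0 * math.pi)


if __name__ == "__main__":
    a, b = solve_endpoints(0.1, 0.2)
    print(a, b, density_norm(a, b, 0.2))
```

For s = 0 the iteration returns the Gaussian cut [-2, 2] immediately; for small nonzero s the cut shifts and widens slightly, and the normalization check ties the curve and the end-point conditions together.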

From the knowledge of the curve (8) and the formalism of the topological recursion we can now determine $F_g$ for 
g ≥ 2 ($F_0$ and $F_1$ must be determined separately, independently of the recursion). In particular, we find the following 
exact result for the free energy at genus 2: 



$$ F_2 = \frac{\ldots}{240\, \delta^4 \left( 1 - \delta - 4\sigma + 3\sigma^2 \right)^5 \left( 1 + \delta - 4\sigma + 3\sigma^2 \right)^5}, $$

where 

$$ \sigma = \frac{(a+b)t}{2}, \qquad \delta = \frac{(a-b)t}{2}. \tag{11} $$



We also obtain an exact result for the free energy $F_3$, which is yet more complicated, with its precise form given in [1]. 
Expanding these results in the form given in (7) and using the perturbative expansion of a and b in s which follows 
from (10), we can determine appropriate generating functions $C_{g,b}(z)$, such as those given in (4). This procedure can 
be continued in an algorithmic manner, and with sufficient computational power, one can determine exact forms of 
$F_g$ for any g, and so the corresponding $C_{g,b}(z)$, and finally all $c_{g,b}(n)$. 

To sum up, we have shown how random matrix theory and the topological recursion can be used to enumerate 
topologically inequivalent RNA complexes. This result opens many other perspectives and research possibilities. On 
one hand, one can consider asymptotics of the numbers $c_{g,b}(n)$ which we have found, and analyze their statistical 
properties. One can also compare these theoretical predictions with the structure of RNA configurations observed 
in Nature. In our discussion, a relation to matrix models arises naturally if we translate RNA configurations into 
the form of chord diagrams, and therefore the $c_{g,b}(n)$ at the same time count the number of such diagrams. Such chord 
diagrams arise in many other problems in mathematics and physics (e.g., in knot theory or algebraic geometry) and 
our results should shed new light on those other fields as well. In particular, a simple transform of the numbers $c_{g,b}(n)$ 
counts a sub-class of chord diagrams called "shapes", which give the number of cells in the ideal cell decomposition 
[8] of Riemann's moduli space for a surface of genus g ≥ 0 with b ≥ 1 boundary components provided 2g + b > 2. 
The computation reported here is therefore at once of significance in computational biology and in geometry, and 
represents a remarkable confluence of biology, mathematics and physics. 